Batch Tuning Strategies for Statistical Machine Translation
Abstract
There has been a proliferation of recent work on SMT tuning algorithms capable of handling larger feature sets than the traditional MERT approach. We analyze a number of these algorithms in terms of their sentence-level loss functions, which motivates several new approaches, including a Structured SVM. We perform empirical comparisons of eight different tuning strategies, including MERT, in a variety of settings. Among other results, we find that a simple and efficient batch version of MIRA performs at least as well as training online, and consistently outperforms other options.
Similar resources
Continuous Learning from Human Post-Edits for Neural Machine Translation
Improving machine translation (MT) by learning from human post-edits is a powerful solution that is still unexplored in the neural machine translation (NMT) framework. Also in this scenario, effective techniques for the continuous tuning of an existing model to a stream of manual corrections would have several advantages over current batch methods. First, they would make it possible to adapt syste...
Tuning Statistical Machine Translation Parameters
Word alignment is the basis of statistical machine translation. GIZA++ is a popular tool for producing word alignments and translation models. It uses a set of parameters that affect the quality of word alignments and translation models. These parameters exist to overcome some problems such as overfitting. This paper addresses the problem of tuning GIZA++ parameter for better translation qualit...
Phrasal: A Toolkit for New Directions in Statistical Machine Translation
We present a new version of Phrasal, an open-source toolkit for statistical phrase-based machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like the bitext, and (c) web-based interactive machine translation. A direct comparison with Moses shows favorable results in terms of decoding s...
The Effect of Translationese on Tuning for Statistical Machine Translation
We explore how the translation direction in the tuning set used for statistical machine translation affects the translation results. We explore this issue for three language pairs. While the results on different metrics are somewhat conflicting, using tuning data translated in the same direction as the translation systems tends to give the best length ratio and Meteor scores for all language pa...
Statistical Machine Translation System for IWSLT 2009
We describe the system developed by the team of the National University of Singapore for the Chinese-English BTEC task of the IWSLT 2009 evaluation campaign. We adopted a state-of-the-art phrase-based statistical machine translation approach and focused on experiments with different Chinese word segmentation standards. In our official submission, we trained a separate system for each segmenter ...